Abstract

It was recently shown that for reasonable notions of approximation
of states and functions by quantum circuits, almost all states and
functions are exponentially hard to approximate [5]. The bounds obtained
are asymptotically tight except for the one based on total variation
distance (TVD). TVD is the most relevant metric for the performance of a
quantum circuit. In this paper we obtain asymptotically tight bounds for
TVD. We show that in a natural sense, almost all states are hard to
approximate to within a TVD of $2/e - \epsilon$ even for exponentially
small $\epsilon$. The quantity $2/e$ is asymptotically the average
distance to the uniform distribution. Almost all states with probability
amplitudes concentrated in a small fraction of the space are hard to
approximate to within a TVD of $2 - \epsilon$. These results imply that
non-uniform quantum circuit complexity is non-trivial in any reasonable
model. They also reinforce the notion that the relative information
distance between states (which is based on the difficulty of transforming
one state to another) fully reflects the dimensionality of the space of
qubits, not the number of qubits.

1 Introduction 

Given two probability distributions $\nu$ and $\mu$ on a finite event space, the total
variation distance (TVD) between $\nu$ and $\mu$ is defined by
$$|\nu - \mu|_1 = \sum_x |\nu(x) - \mu(x)|.$$
The TVD on an event space with $n$ elements is equivalent to the $L_1$
metric on

$$\Delta(n) = \{x \in R^n \mid x \ge 0,\ x \cdot \mathbf{1} = 1\},$$

where $R$ is the set of real numbers, the expression $x \ge 0$ means that each
coordinate $x_i$ of $x$ satisfies $x_i \ge 0$, $y \cdot z$ is the inner product of $y$ and $z$, and
$\mathbf{0}$ ($\mathbf{1}$) is the vector with all entries 0 (1).
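
As a concrete illustration of the definition above, the following Python sketch computes the TVD of two distributions given as lists of probabilities. The function name `tvd` is hypothetical and chosen only for this example; note that with this normalization the distance ranges from 0 to 2.

```python
def tvd(nu, mu):
    """Total variation distance as used in this paper: the L1 distance
    sum_x |nu(x) - mu(x)| between two distributions on the same finite
    event space (no factor 1/2, so the maximum value is 2)."""
    if len(nu) != len(mu):
        raise ValueError("distributions must live on the same event space")
    return sum(abs(a - b) for a, b in zip(nu, mu))


# Two point masses on different events are at the maximal distance 2.
print(tvd([1.0, 0.0], [0.0, 1.0]))
# The uniform distribution on two events versus a point mass.
print(tvd([0.5, 0.5], [1.0, 0.0]))
```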

Our interest in the TVD comes from the theory of quantum circuits.
The domain $H_n$ of computation of a quantum circuit is an $n$-fold tensor product
of qubits $Q$, $H_n = Q^{\otimes n}$. A qubit $Q$ is the two dimensional complex Hilbert
space generated by the basis vectors $|0\rangle$ and $|1\rangle$. The standard basis of $H_n$
consists of elements of the form $|b_1\rangle|b_2\rangle\cdots|b_n\rangle$ which we abbreviate $|b_1 b_2 \cdots b_n\rangle$
or $|b\rangle$ if $b$ is an $n$-bit vector or a number written in binary (in reverse
order). A quantum circuit applies a unitary operation to $H_n$ by composing a
number of primitive unitary operations called quantum gates. A quantum gate
with $g$ inputs is a unitary operator $V$ on $H_g$. The specification of the circuit
describes which $g$ qubits each gate should act on. The gate's action is obtained
by identifying $H_n$ with $H_g \otimes H_{n-g}$, where $H_g$ is the factor corresponding to the
$g$ input qubits. The gate acts as $V \otimes I$ on $H_g \otimes H_{n-g}$, where $I$ is the identity
matrix. The complexity of the quantum circuit is the number of gates applied.
An important property is that all unitary operations are exactly representable
as compositions of 2-qubit gates [4]. See [8, 7, 2] for more detailed descriptions
and motivations.
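
The action $V \otimes I$ of a gate on selected qubits can be sketched in a few lines of Python. This is an illustration only, not a construction from the paper: the function name `apply_gate`, the representation of a state as a list of $2^n$ amplitudes, and the convention that qubit $q$ corresponds to bit $q$ of the list index are all assumptions made for the example.

```python
import math


def apply_gate(state, gate, qubits, n):
    """Apply a g-input gate to an n-qubit state.

    state  : list of 2**n amplitudes; qubit q is bit q of the index (assumed convention)
    gate   : 2**g x 2**g unitary matrix as a nested list (rows = outputs)
    qubits : the g distinct qubit positions the gate acts on
    """
    g = len(qubits)
    new_state = [0j] * (1 << n)
    for index, amp in enumerate(state):
        if amp == 0:
            continue
        # Column of the gate selected by the current values of the input qubits.
        col = 0
        for pos, q in enumerate(qubits):
            col |= ((index >> q) & 1) << pos
        # Bits of the untouched qubits (the identity factor of V (x) I).
        rest = index
        for q in qubits:
            rest &= ~(1 << q)
        for row in range(1 << g):
            target = rest
            for pos, q in enumerate(qubits):
                target |= ((row >> pos) & 1) << q
            new_state[target] += gate[row][col] * amp
    return new_state


# Usage: a circuit with two gates (complexity 2) acting on |00>.
h = 1 / math.sqrt(2)
hadamard = [[h, h], [h, -h]]
flip = [[0, 1], [1, 0]]
state = apply_gate([1, 0, 0, 0], hadamard, [0], 2)
state = apply_gate(state, flip, [1], 2)
print(state)
```

Each call corresponds to one gate, so the number of calls is the circuit complexity in the sense used here.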

The output of a quantum circuit is a state in $H_n$. The knowledge that
can be gained from a state is restricted to what can be learned by measuring
it. A measurement on the first $m$ bits of a state $|x\rangle$ induces a probability
distribution on $m$-bit vectors defined by

$$\mathrm{Prob}_m(b \mid |x\rangle) = \sum_{b'} |\langle b b' | x\rangle|^2$$

for $m$-bit vectors $b$. The goal of a computation is to transform an input state $|b\rangle$ to some
output state $|x(b)\rangle$ whose induced distribution $\mathrm{Prob}_m(\cdot \mid |x(b)\rangle)$ is sufficiently
close to a desired one. Since we are comparing probability distributions, the
TVD is the most appropriate distance measure to use for evaluating the success
of the computation.
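
The induced distribution can be computed by summing squared amplitudes over the unmeasured bits. The sketch below is illustrative only; the name `prob_first_m` and the convention that the first $m$ qubits are the $m$ low-order bits of the amplitude index are assumptions of this example.

```python
def prob_first_m(state, m):
    """Distribution induced by measuring the first m qubits of a state.

    state is a list of 2**n amplitudes; the first m qubits are taken to be
    the m low-order bits of the index (an assumed convention).  The result
    is Prob_m(b | x) = sum over b' of |<b b'|x>|^2 for each m-bit value b.
    """
    dist = [0.0] * (1 << m)
    mask = (1 << m) - 1
    for index, amp in enumerate(state):
        dist[index & mask] += abs(amp) ** 2
    return dist


# Usage: measuring one qubit of the state (|00> + |11>)/sqrt(2).
h = 2 ** -0.5
print(prob_first_m([h, 0, 0, h], 1))
```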

It is shown in [5] that even if |0) is the only input of interest, almost 
no distribution on 2^ can be approximated within a TVD of | — e unless the 
number of gates in the circuit is exponentially large in $m$. The notion of "almost
no distribution" is derived from the induced Lebesgue measure on $\Delta(2^m)$.
This result is suboptimal in two ways. First, one can compute the minimum
expected distance of $x \in \Delta(n)$ from a fixed point in $\Delta(n)$ as $2/e - o(1)$, which
exceeds the distance covered by the bound of [5]. Thus
the situation where a small number of gates can approximate a distribution to 
better than average is not excluded. Second, most computationally interesting 
distributions are highly concentrated. Such distributions are on average within 
a TVD of $2 - o(1)$ of other distributions. Finding an approximation within
distance 1 (say) might already be good. For approximating functions with large
domains, the results for classical approximation problems in [5] show that even
weak approximation is difficult. However, for small domains, the worst-case
complexity of approximating highly concentrated distributions to within $2 - \epsilon$
total variation distance was left open. In this paper we resolve both of these
issues by showing that the number of gates must be nearly exponential for any 
non-trivial approximation to be achieved for a non-negligible fraction of possible 
input-output relationships. 

Our proofs are based on the same arguments as those given in [5], and 
use lemmas given there. The new results in this paper are obtained by making 
use of a large deviation argument to show that random elements of $\Delta(n)$ have
certain properties with respect to the TVD. 

2 Main results 

We begin with some definitions. 

For $N' < N$, there are $\binom{N}{N'}$ ways of embedding $\Delta(N')$ in $\Delta(N)$. For
an $N'$-tuple $S \subset [N]$, let $\Delta(S,N)$ be the face of $\Delta(N)$ consisting of the vectors
$x$ which satisfy that $x_i > 0$ iff $i \in S$. Let $\Delta(N',N) = \bigcup_{S:|S|=N'} \Delta(S,N)$. Note
that $\Delta(N,N) = \Delta(N)$ and for $N' < N''$, $\Delta(N',N) \subset \Delta(N'',N)$.

Let $\Delta(N',N)^k$ be the set of $k$-tuples of members of $\Delta(N',N)$. We
endow $\Delta(N',N)^k$ with the measure $\mu$ obtained by normalizing the Lebesgue
measure so that $\mu(\Delta(N',N)^k) = 1$. This is the natural uniform distribution
on $\Delta(N',N)^k$. The Lebesgue measure is denoted by $\nu$. In general we will use
$\mu$ to denote the uniform distribution and $\nu$ the induced Lebesgue measure on
the polytope of interest. If necessary, we will subscript $\mu$ by the polytope being
considered. When using probabilistic concepts defined on a polytope, we always
mean the uniform distribution.

We extend the TVD to $\Delta(N)^k$ by

$$|x - y|_1 = \frac{1}{k}\sum_{i=1}^{k} |x_i - y_i|_1,$$

which is the average TVD of the components. Here $x_i$ denotes the $i$'th member
of the $k$-tuple $x$ of elements of $\Delta(N)^k$.

A unitary operator $U$ acting on $H_n$ induces a map which takes the
first $k$ basis elements $|0\rangle, \ldots, |k-1\rangle$ to the $k$-tuple of probability distributions
$\mathrm{Prob}_m(\cdot \mid U|0\rangle), \ldots, \mathrm{Prob}_m(\cdot \mid U|k-1\rangle)$ in $\Delta(2^m)^k$. Denote this $k$-tuple by
$\phi_m(U)$.

Let $G_g^{(b,n)}$ be the set of unitary operators on $n$ qubits expressible as
a composition of at most $b$ $g$-input quantum gates. Let $X(b,g,m,n;N,d,k)$
consist of the members of $\Delta(N,2^m)^k$ which are within average TVD $d$ of an
element of $\phi_m(G_g^{(b,n)})$. The number of inputs $g$ is assumed to be constant in the
discussions below.

Note that for the purpose of bounding $X(b,g,m,n;N,d,k)$ from above
we can assume that $b \ge (n-m)/g$. Otherwise some input qubits which do not
participate in the final measurement are not involved in the computation and may
be eliminated. To avoid other trivial cases we assume that $b \ge n \ge m \ge 1$,
$N \ge 2$ and $k \ge 1$.

Theorem 2.1 There exist constants $c_i > 0$ such that for $0 < \epsilon < 2$,
$$\ln\bigl(\mu_{\Delta(2^m)^k}(X(b,g,m,n;2^m,2/e-\epsilon,k))\bigr) \le 2^{c_1 g}\, b \ln(2b/\epsilon) + c_2 mk - c_3 \epsilon^2 2^m k.$$

Lemma 3.1 shows that $2/e - o(1)$ is the average distance of $y \in \Delta(2^m)$
to the uniform distribution $\mathbf{1}/2^m$.

The proof of the theorem can be used to find explicit values of the 
constants 1 . We do not make any attempts to optimize the inequalities in this 
paper. 

Corollary 2.2 For $0 < \alpha < 1$, almost all $k$-tuples of states require $2^{\alpha m(1-o(1))}$
$g$-input gates for approximation by a quantum circuit on the first $m$ qubits to
within a TVD of $2/e - 2^{-(1-\alpha)m/2}$.

Theorem 2.3 There exist constants $c_i > 0$ such that for $N = \gamma 2^m$ and $0 < \epsilon < 2$,

$$\ln\bigl(\mu_{\Delta(N,2^m)^k}(X(b,g,m,n;N,2-\epsilon,k))\bigr)
\le 2^{c_1 g}\, b \ln(2b/\epsilon) + c_2 mk - (c_3 \epsilon - c_4 \gamma^{1/4})\gamma 2^m k.$$

1 These values turn out not to be excessively large or small.

Corollary 2.4 Let $3/4 < \alpha < 1$ and $\gamma = o(2^{-4(1-\alpha)m})$. Consider those $k$-tuples
of states $|x\rangle$ which satisfy that $\mathrm{Prob}_m(\cdot \mid |x\rangle)$ has at most $\gamma 2^m$ non-zero values.
Then almost all such $k$-tuples of states require at least $\gamma 2^{\alpha m(1-o(1))} k$ $g$-input
gates for approximation by a quantum circuit on the first $m$ qubits to within a
TVD of $2 - 2^{-(1-\alpha)m}$.

Proofs of Theorems 2.1 and 2.3. The proofs of the theorems closely follow
those given for Theorems 4.4 and 4.5 in [5]. We outline the proofs, deferring
the proofs of the lemmas to the following section.

First note that if we represent a unitary operator by the composition
of $b$ fixed gates, we have at most $\binom{n}{g}^b$ choices for ways of composing them. This
gives a bound on the number of structurally distinct quantum circuits. The next
observation is that the group of unitary operators on $g$ qubits can be densely
covered using a constant (for fixed $g$) number of operators. This is formalized
by Lemma 4.4 of [5] which we state next. For any linear operator $U$, let $\|U\|_2$
denote the two-norm of $U$ defined by

$$\|U\|_2 = \max_{x : |x| = 1} |Ux|.$$

Lemma 2.5 There exists a subset $U_{g,\delta}$ of $G_g$ with no more than $(2/\delta)^{2^{4g}}$
elements such that for every $V \in G_g$ there exists a $U \in U_{g,\delta}$ satisfying that

$$\|U - V\|_2 \le \delta.$$

The lemma's relevance to the problem at hand is due to the relationship
between the two-norm and the TVD, and the behavior of the two-norm under
composition of unitary operators. The two-norm satisfies

$$|\mathrm{Prob}(\cdot \mid U|b\rangle) - \mathrm{Prob}(\cdot \mid V|b\rangle)|_1 \le 2\|U - V\|_2$$

(Lemma 2.2 of [5]) and for unitary operators $U_i$ and $V_i$

$$\|U_1 U_2 - V_1 V_2\|_2 \le \|U_1 - V_1\|_2 + \|U_2 - V_2\|_2$$

(Lemma 2.3 of [5]).

Let $d = 2/e - \epsilon$ for Theorem 2.1 and $d = 2 - \epsilon$ for Theorem 2.3. Let
$B_x(d) = \{y \mid |x - y|_1 \le d\}$. Let $X = X(b,g,m,n;N,d,k)$. Then $X$ is included
in the union of the balls $B_{\phi_m(U)}((1+\alpha)d)$, where $U$ ranges over the unitary
operators defined by those circuits of at most $b$ elements for which each gate is
in $U_{g,\alpha d/(2b)}$.

Choose $\alpha = \epsilon/(2d)$. First consider the statement of Theorem 2.1. By
Theorem 3.11 there are constants $c_i > 0$ such that

$$\ln\bigl(\mu_{\Delta(N,2^m)^k}(B_{\phi_m(U)}((1+\alpha)d) \cap \Delta(N,2^m)^k)\bigr) \le -(c_1 \epsilon^2 2^m - c_2 m)k,$$

which implies

$$\mu(X) \le \binom{n}{g}^b (4b/(\alpha d))^{2^{4g} b}\, e^{-(c_1 \epsilon^2 2^m - c_2 m)k},$$

$$\ln(\mu(X)) \le b\Bigl(\ln\binom{n}{g} + 2^{4g}\ln(4b/(\alpha d))\Bigr) - c_1 \epsilon^2 2^m k + c_2 mk
\le 2^{c_3 g}\, b\ln(2b/\epsilon) - c_1 \epsilon^2 2^m k + c_2 mk.$$

This proves Theorem 2.1.

To prove Theorem 2.3, we can proceed in a similar fashion. Let
$\gamma = N/2^m$. By Theorem 3.12 there are constants $c_1$, $c_2$ and $c_3$ such that for
sufficiently large $m$,

$$\ln \mu\bigl(B_{\phi_m(U)}((1+\alpha)d) \cap \Delta(N,2^m)^k\bigr) \le -(c_1 \epsilon - c_2 \gamma^{1/4})\gamma 2^m k + c_3 mk.$$

Hence

$$\mu(X) \le \binom{n}{g}^b (4b/(\alpha d))^{2^{4g} b}\, e^{-(c_1 \epsilon - c_2 \gamma^{1/4})\gamma 2^m k + c_3 mk},$$
$$\ln(\mu(X)) \le 2^{c_4 g}\, b\ln(2b/\epsilon) + c_3 mk - (c_1 \epsilon - c_2 \gamma^{1/4})\gamma 2^m k.$$



3 Large Deviation Bounds For Total Variation Distance 

For the remainder of the paper we assume that N > 2. 

3.1 The Expectation of $|x - y|_1$

For fixed $x$, let

$$D(x,N',N) = \int d\mu_{\Delta(N',N)}(y)\, |x - y|_1$$

be the expected TVD of $x$ from elements of $\Delta(N',N)$. Write $D(x,N) = D(x,N,N)$.

Lemma 3.1 For $x \ge 0$, $D(x,N) = \sum_i \frac{2}{N}(1 - x_i)^N + \sum_i x_i - 1$.

Proof. We have $|x - y|_1 = \sum_i |x_i - y_i|$, so by additivity of expectations, we
can consider each coordinate separately. The induced density function of the
distribution of $y_i$ is $(N-1)(1-t)^{N-2}$. The contribution of the $i$'th coordinate
to $D(x,N)$ is

$$\int_0^1 dt\,(N-1)|t - x_i|(1-t)^{N-2}
= (N-1)\int_0^{x_i} dt\,(x_i - t)(1-t)^{N-2} + (N-1)\int_{x_i}^1 dt\,(t - x_i)(1-t)^{N-2}.$$

We have

$$(N-1)\int dt\,(x_i - t)(1-t)^{N-2} = \Bigl(\tfrac{1}{N}(1-t) - (x_i - t)\Bigr)(1-t)^{N-1} + C,$$

so that the contribution of the $i$'th coordinate is

$$\mathrm{Exp}(|x_i - y_i|) = \frac{2}{N}(1 - x_i)^N - \Bigl(\frac{1}{N} - x_i\Bigr).$$

Summing over $i$ gives the lemma. ■



Corollary 3.2 $D(\mathbf{1}/N, N) = 2/e - O(1/N)$.

Lemma 3.3 For $x \ge 0$, $D(x,N',N) = \sum_i \frac{2}{N}(1 - x_i)^{N'} + \sum_i x_i - 1$.

Proof. This is a direct application of Lemma 3.1. The contribution of $x_i$ is

$$\binom{N}{N'}^{-1}\Bigl(\sum_{S : i \in S} \mathrm{Exp}(|x_i - y_i| : y \in \Delta(S,N)) + \sum_{S : i \notin S} x_i\Bigr).$$
■

Lemma 3.4 For $x \in \Delta(N)$, $D(x,N',N)$ is minimized by $x = \mathbf{1}/N$.

Proof. Note that $D(x,N',N)$ is convex in $x$. By symmetry, the minimum
must be achieved by $x = \mathbf{1}/N$. ■
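
The closed form of Lemmas 3.1 and 3.3 can be checked numerically. The following Python sketch is illustrative only: the function names are hypothetical, and the Monte Carlo sampler relies on the standard fact that normalized independent exponential variables are uniformly distributed on the simplex.

```python
import math
import random


def expected_tvd(x, n_support=None):
    """Closed form D(x, N', N) = sum_i (2/N)(1 - x_i)^N' + sum_i x_i - 1.

    With n_support=None this is D(x, N) = D(x, N, N) of Lemma 3.1."""
    n = len(x)
    n_prime = n if n_support is None else n_support
    return sum((2.0 / n) * (1.0 - xi) ** n_prime + xi for xi in x) - 1.0


def sample_simplex(n, rng):
    """Uniform sample from Delta(n) via normalized exponential variables."""
    e = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(e)
    return [v / total for v in e]


def monte_carlo_tvd(x, trials, seed=0):
    """Monte Carlo estimate of the expected TVD of x from uniform y in Delta(N)."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        y = sample_simplex(len(x), rng)
        acc += sum(abs(a - b) for a, b in zip(x, y))
    return acc / trials


n = 20
uniform = [1.0 / n] * n
print(expected_tvd(uniform), 2 / math.e)   # close to each other, as in Corollary 3.2
print(monte_carlo_tvd(uniform, 4000))      # sampled estimate of the same quantity
```

For the uniform point the closed form reduces to $2(1 - 1/N)^N$, which approaches $2/e$ from below as $N$ grows.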
3.2 On the Distribution of $|x - y|_1$ : $y \in \Delta(N)$

Let

$$T(x,d,N',N) = \mu_{\Delta(N',N)}(y \mid |x - y|_1 \le d),$$
$$T(d,N',N) = T(\mathbf{1}/N,d,N',N),$$
$$T(x,d,N) = T(x,d,N,N),$$
$$T(d,N) = T(\mathbf{1}/N,d,N).$$

We would like to obtain good upper bounds on $T(x,d,N)$ for $x \in \Delta(N)$ and
$d = 2/e - \epsilon$.

Theorem 3.5 There exist constants $c_i > 0$ such that for any $x \in \Delta(N)$
$$T(x, 2/e - \epsilon, N) \le e^{-c_1 \epsilon^2 N + c_2 \ln(N)}.$$

The proof of the theorem requires several lemmas. First we simplify
the problem to the case of $x = \mathbf{1}/N$.

Lemma 3.6 For $x$ such that $x \cdot \mathbf{1} = \alpha N$, $T(x,d,N)$ is maximized by $x = \alpha\mathbf{1}$.

Proof. The proof is by induction on $N$. For $N = 2$, the result follows by
inspection. Let $N > 2$. Let $x = (x_1, x')$ with $x' \in R^{N-1}$. We have

$$\mu(y \in \Delta(N) \mid |x - y|_1 \le d)
= \mu((y_1, y') \in \Delta(N) \mid |x_1 - y_1| + |x' - y'|_1 \le d)$$
$$= \int_0^1 dt\,(N-1)(1-t)^{N-2}\,
\mu_{(1-t)\Delta(N-1)}\bigl(y' \in (1-t)\Delta(N-1) \mid |x' - y'|_1 \le d - |x_1 - t|\bigr),$$

where $(1-t)\Delta(N-1) = \{y \ge 0 \mid y \cdot \mathbf{1} = (1-t)\}$. In the last step we used the
fact that the distribution of $y_1$ has density $(N-1)(1-t)^{N-2}$. By induction and
scaling, the integrand is maximized by $x' = (x' \cdot \mathbf{1})\mathbf{1}/(N-1)$ independently of
$y_1$ and $t$. Note that replacing $x'$ by $(x' \cdot \mathbf{1})\mathbf{1}/(N-1)$ does not change $x \cdot \mathbf{1}$. This
implies that the probability of interest is maximized if every subset of $N-1$
coordinates of $x$ is uniform, which is satisfied only by $x = \alpha\mathbf{1}$. ■

To obtain a bound on $T(d,N)$ requires decomposing $\Delta(N)$ according to
which orthant $y - \mathbf{1}/N$ belongs to. Formally, let $\Delta_k(N)$ be the set of $y \in \Delta(N)$
such that exactly the first $k$ coordinates of $y - \mathbf{1}/N$ are positive. Then $\Delta(N)$
is a disjoint union of coordinate permuted copies of the $\Delta_k(N)$. In particular

$$\sum_k \binom{N}{k}\mu_{\Delta(N)}(\Delta_k(N)) = 1.$$

We will make use of the tail probabilities of the sums of $N-1$ independent
identically distributed uniform random variables. We define them here
in the language of polytopes. Let $[0,1] = \{x \in R \mid 0 \le x \le 1\}$ and

$$U(M,s) = \mu_{[0,1]^M}(x \in [0,1]^M \mid x \cdot \mathbf{1} \le s).$$

Lemma 3.7 There exist constants $a_i > 0$ and $0 < \kappa_0 < 1$ such that for $|k/N - \kappa_0| \ge \epsilon$,

$$\binom{N}{k}\mu_{\Delta(N)}(\Delta_k(N)) \le e^{-a_1 \epsilon^2 N + a_2 \ln(N)}.$$



Proof. Note that $\Delta_0(N)$ has measure zero, so we can assume that $k > 0$.
Let $X_k = \Delta_k(N) - \mathbf{1}/N$. We project $X_k$ onto the last $N-1$ coordinates and
consider its measure in the set $S = \{z \mid z \ge -1/N,\ z \cdot \mathbf{1} \le 0\}$. The volume of
this set is $1/(N-1)!$. The projection of an element of $X_k$ can be written as
$(y,z)$ with $y \in R^{k-1}$ and $z \in R^{N-k}$ corresponding to the positive and negative
coordinates, respectively.

$$\mu_{\Delta(N)}(\Delta_k(N)) = \mu_S(X_k)$$
$$= (N-1)!\,\nu\bigl((y,z) \mid y \in R^{k-1},\ y \ge 0,\ z \in R^{N-k},\ 1/N \ge z \ge 0,\ z \cdot \mathbf{1} \ge y \cdot \mathbf{1}\bigr)$$
$$= \frac{(N-1)!}{N^{N-1}}\,\nu\bigl((y,z) \mid y \in R^{k-1},\ y \ge 0,\ z \in [0,1]^{N-k},\ z \cdot \mathbf{1} \ge y \cdot \mathbf{1}\bigr)$$
$$= \frac{(N-1)!}{N^{N-1}}\,\nu\bigl((y,z) \mid y \in R^{k-1},\ y \ge 0,\ z \in [0,1]^{N-k},\ N - k \ge (y,z) \cdot \mathbf{1}\bigr),$$

where we scaled by $N$ in the second step and obtained the last identity by
replacing $z$ with $\mathbf{1} - z$. The volume in the last expression can be decomposed
according to which translate of the standard hypercube $y$ is in. We label these
translates by the coordinates of the corner nearest the origin and note that by
symmetry, only the sum of these coordinates is relevant. This gives

$$\mu_{\Delta(N)}(\Delta_k(N)) = \frac{(N-1)!}{N^{N-1}} \sum_l \binom{l+k-2}{k-2}\,\nu\bigl(y \in [0,1]^{N-1} \mid N - k - l \ge y \cdot \mathbf{1}\bigr)$$
$$= \frac{(N-1)!}{N^{N-1}} \sum_l \binom{l+k-2}{k-2}\, U(N-1, N-k-l),$$

with $U(M,s) = \mu_{[0,1]^M}(x \in [0,1]^M \mid x \cdot \mathbf{1} \le s)$ the tail probability of a sum of $M$ uniform variables.

Let

$$C(k,l) = \binom{N}{k}\frac{(N-1)!}{N^{N-1}}\binom{l+k-2}{k-2}\, U(N-1, N-k-l).$$

We next show that there exist $b_i > 0$, $0 < \kappa_0 < 1$ and $0 < \xi_0 < 1$ such that for
$(|k/N - \kappa_0|^2 + |l/N - \xi_0|^2)^{1/2} \ge \epsilon$, $C(k,l) \le e^{-b_1 \epsilon^2 N + b_2 \ln(N)}$. By
summing over $l$, this implies that for $|k/N - \kappa_0| \ge \epsilon$

$$\binom{N}{k}\mu_{\Delta(N)}(\Delta_k(N)) \le e^{-b_1 \epsilon^2 N + (b_2 + 1)\ln(N)},$$

which gives the lemma.

To prove the desired property of $C(k,l)$, consider the functions

$$f(\kappa,\xi,N) = \ln(C(\lfloor \kappa N\rfloor, \lfloor \xi N\rfloor))/N,$$
$$f(\kappa,\xi) = \lim_N f(\kappa,\xi,N),$$

with domain $0 < \kappa < 1$ and $0 < \kappa + \xi < 1$. Since the sum of the $C(k,l)$ is
1, $f(\kappa,\xi) \le 0$. Let $H_e(x) = -x\ln(x) - (1-x)\ln(1-x)$ be the information
function base $e$. Then for some constant $b_2$,

$$f(\kappa,\xi,N) \le -1 + H_e(\kappa) + (\kappa+\xi)H_e(\kappa/(\kappa+\xi)) + \ln(U(N-1, N(1-\kappa-\xi)))/N + b_2\ln(N)/N,$$

where we applied Lemma A.3 and Stirling's approximation. The term $b_2\ln(N)/N$
accounts for the polynomial factors in Stirling's approximation of $(N-1)!$ as
well as the correction for integer rounding in Lemma A.3. By Theorem A.7,
$r(x) = -\lim_n \ln(U(n,xn))/n$ is convex (where it is finite) and identically 0 for
$x \ge 1/2$. In addition $\ln(U(n,xn))/n \le -r(x)$. Hence

$$f(\kappa,\xi,N) \le f_u(\kappa,\xi,N) =_{\mathrm{def}} b_2\ln(N)/N - 1 + H_e(\kappa)
+ (\kappa+\xi)H_e(\kappa/(\kappa+\xi)) - \frac{N-1}{N}\, r\Bigl((1-\kappa-\xi)\frac{N}{N-1}\Bigr),$$

$$f(\kappa,\xi) = -1 + H_e(\kappa) + (\kappa+\xi)H_e(\kappa/(\kappa+\xi)) - r(1-\kappa-\xi).$$

To show that $f(\kappa,\xi)$ is strictly concave, we evaluate the Hessian of $g(\kappa,\xi) =
f(\kappa,\xi) + r(1-\kappa-\xi)$. Note that $(\kappa+\xi)H_e(\kappa/(\kappa+\xi)) = -\kappa\ln(\kappa) - \xi\ln(\xi) +
(\kappa+\xi)\ln(\kappa+\xi)$.

$$\partial_\kappa g(\kappa,\xi) = \ln(1-\kappa) - 2\ln(\kappa) + \ln(\kappa+\xi),$$
$$\partial_\kappa^2 g(\kappa,\xi) = -\frac{1}{1-\kappa} - \frac{2}{\kappa} + \frac{1}{\kappa+\xi}
= -\frac{1}{\kappa(1-\kappa)} - \frac{\xi}{\kappa(\kappa+\xi)},$$
$$\partial_\xi\partial_\kappa g(\kappa,\xi) = \frac{1}{\kappa+\xi},$$
$$\partial_\xi^2 g(\kappa,\xi) = -\frac{\kappa}{\xi(\kappa+\xi)}.$$

The Hessian of $g$ is therefore given by

$$\begin{pmatrix}
-\frac{1}{\kappa(1-\kappa)} - \frac{\xi}{\kappa(\kappa+\xi)} & \frac{1}{\kappa+\xi} \\
\frac{1}{\kappa+\xi} & -\frac{\kappa}{\xi(\kappa+\xi)}
\end{pmatrix}.$$

Thus the diagonal elements of the Hessian are strictly negative for $0 < \xi < \infty$
and $0 < \kappa < 1$. Its determinant is given by

$$\Bigl(\frac{1}{\kappa(1-\kappa)} + \frac{\xi}{\kappa(\kappa+\xi)}\Bigr)\frac{\kappa}{\xi(\kappa+\xi)} - \frac{1}{(\kappa+\xi)^2}
= \frac{1}{\xi(1-\kappa)(\kappa+\xi)}.$$

This is strictly positive on the domain, which implies strict concavity of $g(\kappa,\xi)$
and hence of $f(\kappa,\xi)$. The function $f(\kappa,\xi)$ therefore has a unique maximum. The
value at the maximum is 0 by the asymptotic lower bounds of Theorem A.7 and
the fact that $C(k,l) \le 1$. Let $\kappa_0$ and $\xi_0$ be the location of the maximum of
$f(\kappa,\xi)$. Since $f(\kappa,\xi) = -\infty$ on the boundary of its domain, the maximum
occurs in the interior. The concavity and differentiability properties imply that
there exists $b_3 > 0$ such that if $((\kappa-\kappa_0)^2 + (\xi-\xi_0)^2)^{1/2} \ge \epsilon$, then $f(\kappa,\xi) \le
-b_3\epsilon^2$ (this can be shown formally by use of the multidimensional Taylor series
expansion with the remainder and applying strict concavity and boundedness
of the domain). Choose $b_1$ small enough and $b_2$ large enough to compensate
for the differences in the arguments of $r$ in $f_u(\kappa,\xi,N)$ and $f(\kappa,\xi)$. This gives
$f_u(\kappa,\xi,N) \le -b_1\epsilon^2 + b_2\ln(N)/N$. ■
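
The second derivatives of $g$ and the determinant $1/(\xi(1-\kappa)(\kappa+\xi))$ can be sanity-checked with finite differences. The sketch below is only an illustration; the function names and the step size are choices made for this example, and $g$ is written in the expanded form $-1 - 2\kappa\ln\kappa - (1-\kappa)\ln(1-\kappa) - \xi\ln\xi + (\kappa+\xi)\ln(\kappa+\xi)$.

```python
import math


def g(kappa, xi):
    """g = -1 + H_e(kappa) + (kappa + xi) H_e(kappa / (kappa + xi)), expanded."""
    return (-1.0 - 2.0 * kappa * math.log(kappa)
            - (1.0 - kappa) * math.log(1.0 - kappa)
            - xi * math.log(xi)
            + (kappa + xi) * math.log(kappa + xi))


def hessian_closed_form(kappa, xi):
    """Entries (g_kk, g_kx, g_xx) of the Hessian as stated in the text."""
    g_kk = -1.0 / (kappa * (1.0 - kappa)) - xi / (kappa * (kappa + xi))
    g_kx = 1.0 / (kappa + xi)
    g_xx = -kappa / (xi * (kappa + xi))
    return g_kk, g_kx, g_xx


def hessian_numeric(kappa, xi, h=1e-4):
    """Central finite-difference approximation of the same three entries."""
    g_kk = (g(kappa + h, xi) - 2.0 * g(kappa, xi) + g(kappa - h, xi)) / h ** 2
    g_xx = (g(kappa, xi + h) - 2.0 * g(kappa, xi) + g(kappa, xi - h)) / h ** 2
    g_kx = (g(kappa + h, xi + h) - g(kappa + h, xi - h)
            - g(kappa - h, xi + h) + g(kappa - h, xi - h)) / (4.0 * h ** 2)
    return g_kk, g_kx, g_xx


kappa, xi = 0.3, 0.4
g_kk, g_kx, g_xx = hessian_closed_form(kappa, xi)
print(hessian_numeric(kappa, xi))
print(g_kk * g_xx - g_kx ** 2, 1.0 / (xi * (1.0 - kappa) * (kappa + xi)))
```

Negative diagonal entries together with a positive determinant are exactly the conditions for a negative definite $2 \times 2$ matrix, which is what the concavity argument uses.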

Lemma 3.7 allows us to consider only those $\binom{N}{k}\Delta_k(N)$ with $k$
near the maximizing value. Define

$$T_k(d,N) = \binom{N}{k}\mu_{\Delta(N)}(y \in \Delta_k(N) \mid |\mathbf{1}/N - y|_1 \le d).$$

To estimate $T_k$, we will study its density function $T'_k(d,N) = \frac{d}{dt}T_k(t,N)|_{t=d}$.
Note that $T_k(t,N)$ is differentiable.

Lemma 3.8 Let $\kappa_0$ be as in Lemma 3.7. There exist constants $b_i > 0$, $\delta_0 > 0$
and a function $0 < d(\kappa) < 2$ such that for $|\kappa - \kappa_0| \le \delta_0$ and $|d - d(\kappa)| \ge \epsilon$,
$T'_{\lfloor\kappa N\rfloor}(d,N) \le e^{-b_1\epsilon^2 N + b_2\ln(N)}$. The function $d(\kappa)$ can be chosen to be
continuously differentiable on its domain.

Proof. By using the first part of the proof of Lemma 3.7, we can write $T_k$ as
follows:

$$T_k(d,N) = \binom{N}{k}\frac{(N-1)!}{N^{N-1}}\,\nu\bigl((y,z) \mid y \in R^{k-1},\ y \ge 0,\ z \in [0,1]^{N-k},\ 2z \cdot \mathbf{1} \le Nd,\ y \cdot \mathbf{1} \le z \cdot \mathbf{1}\bigr).$$

Differentiating by $d$ gives

$$T'_k(d,N) = \frac{1}{2}\binom{N}{k}\frac{(N-1)!}{N^{N-2}}\,\frac{(Nd/2)^{k-1}}{(k-1)!}\, U'(N-k, Nd/2).$$

Consider $k = \lfloor\kappa N\rfloor$ and define

$$t(d) = \lim_N \ln(T'_{\lfloor\kappa N\rfloor}(d,N))/N.$$

We proceed as in the proof of Lemma 3.7 and use Theorem A.7 to obtain:

$$t(d) = H_e(\kappa) - 1 + (1-\kappa)h'(d/(2(1-\kappa))) + \kappa\ln(d/2) + \kappa - \kappa\ln(\kappa),$$

where $h'(x) = \lim_N \ln(U'(N,xN))/N$ is strictly concave. It is clear that $t(d)$ is
strictly concave in $d$, with a negative second derivative where it is finite. Hence $t$
has a unique maximum for some $d = d(\kappa)$ at which it must be 0. It follows that
there is a constant $b_3 > 0$ such that $e^{t(d(\kappa) - \epsilon)N} \le e^{-b_3\epsilon^2 N}$. Since $h'(x) = -\infty$
for $x > 1$, and the second derivative of $\ln(d/2)$ is strictly bounded above by $c < 0$
for $d/2 < 1$, we can choose $b_3$ independently of $\kappa$. The derivative $\partial_d t$ is strictly
monotone in $d$ for each $\kappa$ in the domain and is continuously differentiable in both
$\kappa$ and $d$ (using Theorem A.7 for $h'$). Thus $d(\kappa)$ is defined by $\partial_d t(d(\kappa)) = 0$. By
implicit differentiation, $\partial_d^2 t\,\partial_\kappa d + \partial_\kappa\partial_d t = 0$. By strict concavity and continuity
of the functions involved, $d(\kappa)$ is well defined with a continuous derivative.

To obtain the bound of the lemma, it now suffices to apply (5) of
Theorem A.7 and Stirling's approximation. Note that $0 < \kappa_0 < 1$ so that the
term $r'(x)$ in (5) is bounded for $\delta_0$ small enough. ■

We are now ready to give the proof of Theorem 3.5.

Proof of Theorem 3.5. The quantity $2/e$ is asymptotically the average
distance of elements of $\Delta(N)$ to $\mathbf{1}/N$. Let $\kappa_0$ and $a_i$ be as in the statement
of Lemma 3.7 and $d(\kappa)$, $b_i$ and $\delta_0$ as in the statement of Lemma 3.8. Let
$d_0 = d(\kappa_0)$. Choose $c_3$ such that $|d_0 - d(\kappa_0 + l)| \le c_3|l|$ for all $|l| \le \delta_0$.

We claim that $d_0 = 2/e$. The results so far imply that the distribution
of $|\mathbf{1}/N - y|_1$ is strongly concentrated at its average as $N \to \infty$, which implies
the result. More specifically, to see that $d_0 \le 2/e + o(1)$, consider for $\delta/(2c_3) \le \delta_0$

$$T(d_0 - \delta, N) \le \sum_{k : |k/N - \kappa_0| \le \delta/(2c_3)} T_k(d_0 - \delta, N) + \sum_{k : |k/N - \kappa_0| > \delta/(2c_3)} T_k(2, N)$$
$$\le e^{-b_1(\delta/2)^2 N + b'_2\ln(N)} + e^{-a_1(\delta/(2c_3))^2 N + a'_2\ln(N)},$$

where $a'_2$ and $b'_2$ have been adjusted to absorb factors of $N$ and 2 from the
summation and integration of $T'_k$.

Let $d_a$ be the average value of $|\mathbf{1}/N - y|_1$. The above inequalities imply
that

$$d_a \ge (d_0 - \delta)(1 - e^{-c\delta^2 N(1 + o(1))})$$

for a constant $c > 0$ determined by $a_1$, $b_1$ and $c_3$. A reverse inequality is obtained
similarly and the claim follows by letting $N \to \infty$ and $\delta \to 0$.

Replacing $\delta$ by $\epsilon$ in the inequalities above and choosing $c_2$ large enough
gives the theorem, provided that $\epsilon/(2c_3) \le \delta_0$. One can extend the result to all
$\epsilon$ by noting that only the case $\epsilon < 2$ is non-trivial and choosing $c_1$ small enough
and $c_2$ large enough to cover the remaining range by exploiting monotonicity of
$T(d_0 - \epsilon, N)$ for $\epsilon/(2c_3) > \delta_0$ and $N$ large enough. ■



3.3 On the Distribution of $|x - y|_1$ : $y \in \Delta(N',N)$

Consider $\Delta(N',N)$ with $N' = o(N)$. We would like to show that for all $x \in
\Delta(N)$, most elements of $\Delta(N',N)$ have distance at least $2 - \epsilon$.

Theorem 3.9 There exist constants $c_i > 0$ such that for $0 < \gamma < 1$

$$\mu_{\Delta(\lfloor\gamma N\rfloor, N)}(y \mid |x - y|_1 \le 2 - \epsilon) \le e^{-(c_1\epsilon - c_2\gamma^{1/4})\gamma N + c_3\ln(N)}.$$

Proof. We assume without loss of generality that $\lfloor\gamma N\rfloor = \gamma N$ (the correction
to the exponent on the righthand side can be absorbed by the $c_3\ln(N)$ term).
Fix $N$ and let $\delta$ and $\rho$ be positive constants with properties to be determined.
Let $x \in \Delta(N)$. Define

$$L(x) = \{i \mid x_i \le \delta/N\},$$
$$B(S) = \Delta(S,N) \cap \{y \mid |x - y|_1 \le 2 - \epsilon\}$$

with $|S| = \gamma N$. Our goal is to show that for most $S$, $\mu_{\Delta(S,N)}(B(S))$ is small.
To do so requires another lemma on the distribution of the TVD.

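Lemma 3.10 Let $z = (z^{(1)}, z^{(2)})$ with $z^{(1)} \in R^k$ and $z^{(2)} \in R^{N-k}$. Then for
$k = \lfloor\kappa N\rfloor$,

$$\mu_{\Delta(N)}\bigl(y \mid |z - y|_1 \le |z^{(1)} \cdot \mathbf{1}| + |1 - z^{(2)} \cdot \mathbf{1}| - \epsilon\bigr) \le e^{(\kappa|\ln(\kappa/e)| - \epsilon(1-\kappa)/2)N}.$$

Proof. For $y \in \Delta(N)$, write $y = (y^{(1)}, y^{(2)})$ with $y^{(1)} \in R^k$ and $y^{(2)} \in R^{N-k}$.
Let $w_1 = y^{(1)} \cdot \mathbf{1}$. We have

$$|z - y|_1 \ge |z^{(1)} \cdot \mathbf{1} - y^{(1)} \cdot \mathbf{1}| + |z^{(2)} \cdot \mathbf{1} - y^{(2)} \cdot \mathbf{1}|$$
$$= |z^{(1)} \cdot \mathbf{1} - w_1| + |1 - w_1 - z^{(2)} \cdot \mathbf{1}|$$
$$\ge |z^{(1)} \cdot \mathbf{1}| + |1 - z^{(2)} \cdot \mathbf{1}| - 2w_1.$$

It follows that

$$\mu_{\Delta(N)}\bigl(y \mid |z - y|_1 \le |z^{(1)} \cdot \mathbf{1}| + |1 - z^{(2)} \cdot \mathbf{1}| - \epsilon\bigr) \le \mu(y \mid w_1 \ge \epsilon/2).$$

The distribution of $w_1$ for $y$ in $\Delta(N)$ is that of a $\beta$ distribution:

$$f(w_1) = (N-1)\binom{N-2}{k-1} w_1^{k-1}(1 - w_1)^{N-k-1}.$$

We can estimate

$$\mu(y \mid w_1 \ge \epsilon/2) \le \int_{\epsilon/2}^1 dt\,(N-1)\binom{N-2}{k-1}(1-t)^{N-k-1}
= \binom{N-1}{k-1}(1 - \epsilon/2)^{N-k}
\le e^{(\kappa|\ln(\kappa/e)| - \epsilon(1-\kappa)/2)N},$$

where we used Lemma A.3 and its corollary. ■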

Suppose that $|S \cap L(x)| = (1-\xi)|S| = (1-\xi)\gamma N$. We can estimate
$\mu_{\Delta(S,N)}(B(S))$ with the help of Lemma 3.10 by projecting $x$ on the coordinates
in $S$ and considering the coordinates in $S \setminus L(x)$ versus those in $S \cap L(x)$.
Let $w_1$, $w_2$ and $w_3$ be the total weight of the coordinates of $x$ in $S \setminus L(x)$,
$S \cap L(x)$ and the complement of $S$, respectively. We have $w_2 \le (1-\xi)\delta\gamma$
and $w_1 + w_3 \ge 1 - (1-\xi)\delta\gamma$. The distance parameter in Lemma 3.10 relative
to $S$ partitioned into $S \setminus L(x)$ and $S \cap L(x)$ is given by $w_1 + 1 - w_2$. The
distance of $x$ to an element of $\Delta(S,N)$ due to the coordinates outside of $S$ is
$w_3$. We have $w_1 + 1 - w_2 + w_3 \ge 2 - \epsilon/2$, provided that $\delta\gamma \le \epsilon/4$. Write
$a(\xi) = -\xi|\ln(\xi/e)| + \epsilon(1-\xi)/4$. Lemma 3.10 implies that

$$\mu_{\Delta(S,N)}(B(S)) \le e^{-a(\xi)\gamma N}.$$

Let $a = a(\rho)$ with $0 < \rho < 1$. Since $a(\xi)$ is decreasing in $\xi$, we have $\mu_{\Delta(S,N)}(B(S)) \le
e^{-a\gamma N}$ provided that $|S \cap L(x)| \ge |S| - \lceil\rho|S|\rceil$ and $\delta\gamma \le \epsilon/4$.

We estimate the fraction of subsets $S$ satisfying $|S \cap L(x)| < |S| - \lceil\rho|S|\rceil$.
Note that $(N - |L(x)|)\delta/N \le 1$, so that $|L(x)|/N \ge 1 - 1/\delta$. If $1/\delta \le \rho$ we can
apply Lemma A.6 and monotonicity of $K_e$ to obtain

$$|\{S \mid |S| = \gamma N,\ |S \cap L(x)| < |S| - \lceil\rho|S|\rceil\}|
\le \binom{N}{\gamma N} e^{-K_e(\rho, 1/\delta)\gamma N + \ln(\gamma N)}$$
$$\le \binom{N}{\gamma N} e^{(-\rho\ln(\rho\delta) - 1/\delta + \rho)\gamma N + \ln(\gamma N)}
\le \binom{N}{\gamma N} e^{-\ln(\rho\delta/e)\rho\gamma N + \ln(\gamma N)}.$$

Combining these results we get

$$\mu_{\Delta(\gamma N,N)}(y \mid |x - y|_1 \le 2 - \epsilon) \le e^{-(\epsilon(1-\rho)/4 - \rho|\ln(\rho/e)|)\gamma N} + e^{-\ln(\delta\rho/e)\rho\gamma N + \ln(\gamma N)},$$

for $0 < \rho < 1$, $\delta\rho \ge 1$ and $\delta\gamma \le \epsilon/4$.

A rough estimate can be obtained by letting $\rho = \epsilon/(16|\ln(\epsilon/(16e))|)$
and $\delta = \epsilon/(4\gamma)$. We will assume that $\gamma \le (\epsilon/(16e^2))\,\epsilon^2/(64|\ln(\epsilon/(16e))|)$. This
is true for $\gamma \le c'_2\epsilon^4$ for some constant $c'_2$. Recall that without loss of generality
$\epsilon < 2$. Thus $\ln(8e) \le |\ln(\epsilon/(16e))| \le 16e/\epsilon$.

$$\epsilon(1-\rho)/4 \ge 15\epsilon/64,$$
$$|\ln(\rho/e)| = \bigl|\ln(\epsilon/(16e)) - \ln|\ln(\epsilon/(16e))|\bigr| \le 2|\ln(\epsilon/(16e))|,$$
$$\rho|\ln(\rho/e)| \le \epsilon/8,$$
$$\ln(\rho\delta/e) \ge |\ln(\epsilon/(16e))|,$$
$$\rho\ln(\rho\delta/e) \ge \epsilon/16.$$

The inequality of the theorem follows. ■ 
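
The chain of elementary inequalities used for the rough estimate can be verified numerically for particular values of $\epsilon$ and $\gamma$. The sketch below is an illustration under the parameter choices $\rho = \epsilon/(16|\ln(\epsilon/(16e))|)$ and $\delta = \epsilon/(4\gamma)$ stated in the proof; the function name and the dictionary of labelled checks are hypothetical.

```python
import math


def theorem_3_9_choices(eps, gamma):
    """Return (rho, delta, gamma_max, checks) for 0 < eps < 2 and gamma > 0.

    checks maps a label to whether the corresponding inequality from the
    proof of Theorem 3.9 holds for these values."""
    log_term = abs(math.log(eps / (16 * math.e)))
    rho = eps / (16 * log_term)
    delta = eps / (4 * gamma)
    gamma_max = (eps / (16 * math.e ** 2)) * eps ** 2 / (64 * log_term)
    checks = {
        "gamma <= gamma_max": gamma <= gamma_max,
        "ln(8e) <= |ln(eps/16e)| <= 16e/eps":
            math.log(8 * math.e) <= log_term <= 16 * math.e / eps,
        "eps(1-rho)/4 >= 15eps/64": eps * (1 - rho) / 4 >= 15 * eps / 64,
        "|ln(rho/e)| <= 2|ln(eps/16e)|":
            abs(math.log(rho / math.e)) <= 2 * log_term,
        "rho|ln(rho/e)| <= eps/8": rho * abs(math.log(rho / math.e)) <= eps / 8,
        "rho*delta >= 1": rho * delta >= 1,
        "rho*ln(rho*delta/e) >= eps/16":
            rho * math.log(rho * delta / math.e) >= eps / 16,
    }
    return rho, delta, gamma_max, checks


eps = 0.5
gamma = theorem_3_9_choices(eps, 1e-9)[2] / 2   # half of the largest admissible gamma
for label, ok in theorem_3_9_choices(eps, gamma)[3].items():
    print(label, ok)
```

With these inequalities the two exponents in the combined bound are at most $-(15\epsilon/64 - \epsilon/8)\gamma N$ and $-(\epsilon/16)\gamma N + \ln(\gamma N)$, respectively.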

3.4 Extensions of the bounds to $\Delta(N',N)^k$

It is now straightforward to obtain general bounds for $\Delta(N',N)^k$ by using
Lemma A.1.

Theorem 3.11 There exist $c_i > 0$ such that for $1 \le k \le N$ and $x \in \Delta(N)^k$
$$\mu_{\Delta(N)^k}(y \mid |x - y|_1 \le 2/e - \epsilon) \le e^{-(c_1\epsilon^2 N - c_2\ln(N))k}.$$

Proof. Theorem 3.5 and Lemma A.1 with $m = 2k$ give

$$\mu_{\Delta(N)^k}(y \mid |x - y|_1 \le 2/e - \epsilon) \le \binom{2k-1}{k-1}\bigl(e^{-c_1(\epsilon^2/4)N + c_2\ln(N)}\bigr)^k
\le e^{-c'_1\epsilon^2 Nk + c'_2\ln(N)k}$$

for suitable choices of constants. ■

Theorem 3.12 There exist $c_i > 0$ such that for $0 < \gamma < 1$

$$\mu_{\Delta(\gamma N,N)^k}(y \mid |x - y|_1 \le 2 - \epsilon) \le e^{-(c_1\epsilon - c_2\gamma^{1/4})\gamma Nk + c_3\ln(N)k}.$$

Proof. Follow the proof of Theorem 3.11, using Theorem 3.9 and Lemma A.1
with $m = 2k$. ■



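A Appendix

A.1 Miscellaneous Bounds

We begin by giving several lemmas which are special cases of weak large
deviation laws.

Lemma A.1 Let $\mu_i$ be probability distributions on $R$ and $\mu^{(n)} = \prod_{i=1}^n \mu_i$. Suppose
that $\mu_i(x \mid x \ge t) \le e^{-\lambda(t)}$, with $\lambda(t)$ convex (where finite) and for $t \le 0$,
$\lambda(t) = 0$. Then for $m \ge n$,

$$\mu^{(n)}(x \mid x \cdot \mathbf{1} \ge nt) \le \binom{m-1}{n-1} e^{-n\lambda(t(1 - n/m))}.$$

Proof. Let $m \ge n$. Consider $x \in R^n$ such that $x \cdot \mathbf{1} \ge nt$. If $y$ is the vector
with coordinates $y_i = \lfloor x_i m/(nt)\rfloor nt/m$, then $y \cdot \mathbf{1} \ge nt(1 - n/m)$. It follows
that for each such $x$, there is an integer vector $l$ such that $l \cdot \mathbf{1} = m - n$ and
$l\,nt/m \le x$. Define $\sigma(l_i) = l_i nt/m$ for $l_i > 0$ and $\sigma(l_i) = -\infty$ otherwise. Using the
assumption that $\lambda(t) = 0$ for $t \le 0$, we can estimate

$$\mu^{(n)}(x \mid x \cdot \mathbf{1} \ge nt) \le \sum_{l : l \in Z^n,\ l \ge 0,\ l \cdot \mathbf{1} = m-n} \mu^{(n)}(x \mid x \ge \sigma(l))
\le \sum_{l : l \in Z^n,\ l \ge 0,\ l \cdot \mathbf{1} = m-n} e^{-\sum_i \lambda(l_i nt/m)}.$$

Convexity of $\lambda$ implies that

$$\sum_{i=1}^n \lambda(l_i nt/m) \ge n\lambda\Bigl(\sum_i l_i t/m\Bigr).$$

This gives

$$\mu^{(n)}(x \mid x \cdot \mathbf{1} \ge nt) \le \sum_{l : l \in Z^n,\ l \ge 0,\ l \cdot \mathbf{1} = m-n} e^{-n\lambda(t(1 - n/m))}
= \binom{m-1}{n-1} e^{-n\lambda(t(1 - n/m))}.$$
■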



Let $H_e(\kappa) = -\kappa\ln(\kappa) - (1-\kappa)\ln(1-\kappa)$. This is the information
function base $e$.

Lemma A.2 For $0 < \kappa < 1$, $H_e(\kappa) \le \kappa|\ln(\kappa/e)|$.

Proof. The summand $-(1-\kappa)\ln(1-\kappa)$ is concave with a slope of 1 at $\kappa = 0$. ■

Lemma A.3 For $n \ge 1$ and $0 \le \kappa \le 1$,

$$\binom{n}{\lfloor\kappa n\rfloor} \le e^{H_e(\lfloor\kappa n\rfloor/n)n} \le e^{H_e(\kappa)n + \ln(en)}$$

and $\lim_n \ln\binom{n}{\lfloor\kappa n\rfloor}/n = H_e(\kappa)$.

Proof. For $\kappa n$ integral, it can be shown that $\binom{n}{\kappa n} \le e^{H_e(\kappa)n}$ by applying a
tight form of Stirling's approximation, for example,

$$\sqrt{2\pi n}\,(n/e)^n e^{1/(12n+1)} < n! < \sqrt{2\pi n}\,(n/e)^n e^{1/(12n)}.$$

This form of Stirling's approximation can be found in [6]. For non-integral $\kappa n$ it
suffices to observe that $|H_e(\kappa) - H_e(\lfloor\kappa n\rfloor/n)| \le H_e(1/n)$. The result then follows
by Lemma A.2. ■

Corollary A.4 For $0 < \kappa < 1$, $\binom{n}{\lfloor\kappa n\rfloor} \le e^{\kappa|\ln(\kappa/e)|n}$.

Proof. Let $\kappa' = \lfloor\kappa n\rfloor/n$. By Lemma A.3 we have

$$\binom{n}{\lfloor\kappa n\rfloor} \le e^{\kappa'|\ln(\kappa'/e)|n} \le e^{\kappa|\ln(\kappa/e)|n}.$$
■
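
The binomial bounds of Lemma A.3 and Corollary A.4 are easy to spot-check with exact integer binomials. The following Python sketch is illustrative; the function names are hypothetical and a small additive tolerance is used because the comparison is made on logarithms in floating point.

```python
import math


def entropy_e(kappa):
    """Information function base e, with H_e(0) = H_e(1) = 0."""
    if kappa <= 0.0 or kappa >= 1.0:
        return 0.0
    return -kappa * math.log(kappa) - (1.0 - kappa) * math.log(1.0 - kappa)


def lemma_a3_holds(n, kappa, tol=1e-9):
    """Check C(n, floor(kappa n)) <= e^{H_e(floor(kappa n)/n) n} <= e^{H_e(kappa) n + ln(e n)}."""
    k = math.floor(kappa * n)
    log_binom = math.log(math.comb(n, k))
    first = entropy_e(k / n) * n
    second = entropy_e(kappa) * n + math.log(math.e * n)
    return log_binom <= first + tol and first <= second + tol


def corollary_a4_holds(n, kappa, tol=1e-9):
    """Check C(n, floor(kappa n)) <= e^{kappa |ln(kappa/e)| n} for 0 < kappa < 1."""
    k = math.floor(kappa * n)
    bound = kappa * abs(math.log(kappa / math.e)) * n
    return math.log(math.comb(n, k)) <= bound + tol


print(lemma_a3_holds(100, 0.3), corollary_a4_holds(100, 0.3))
```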



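For $0 < \kappa < 1$ and $0 \le \xi \le 1$, define $K_e(\xi,\kappa) = \xi\ln(\xi/\kappa) + (1-\xi)\ln((1-\xi)/(1-\kappa))$.
Also let $K_e(0,0) = K_e(1,1) = 0$ and $K_e(\xi,0) = K_e(\xi,1) = \infty$
otherwise.

Lemma A.5 For $0 < \kappa < \xi$, $K_e(\xi,\kappa) \ge \xi\ln(\xi/\kappa) + \kappa - \xi$.

Proof.

$$K_e(\xi,\kappa) = \xi\ln(\xi/\kappa) + (1-\xi)\ln((1-\xi)/(1-\kappa))$$
$$\ge \xi\ln(\xi/\kappa) + (1-\xi)(1 - (1-\kappa)/(1-\xi))$$
$$= \xi\ln(\xi/\kappa) + \kappa - \xi,$$

since $\ln(x) \ge (1 - 1/x)$ for $0 < x \le 1$. ■

Lemma A.6 Let $0 < \gamma \le 1$ and $\gamma' = \lfloor\gamma n\rfloor/n$. For $0 < \kappa \le \xi \le 1$ and $1 \le n$,

$$\binom{n - \lfloor\kappa n\rfloor}{\gamma' n - \lceil\xi\gamma' n\rceil}\binom{\lfloor\kappa n\rfloor}{\lceil\xi\gamma' n\rceil} \le \binom{n}{\gamma' n} e^{-K_e(\xi,\kappa)\gamma' n}.$$

Proof. Define $(n)_l = n(n-1)\cdots(n-l+1)$ (the $l$'th falling factorial of $n$).
Assume first that $\kappa n$ and $\xi\gamma' n$ are integral, and ignore the restriction that $\kappa \le \xi$.
The inequality is trivial for $\xi\gamma' > \kappa$.

$$\binom{(1-\kappa)n}{(1-\xi)\gamma' n}\binom{\kappa n}{\xi\gamma' n}
= \binom{n}{\gamma' n}\binom{\gamma' n}{\xi\gamma' n}\frac{(\kappa n)_{\xi\gamma' n}\,((1-\kappa)n)_{(1-\xi)\gamma' n}}{(n)_{\gamma' n}}$$
$$\le \binom{n}{\gamma' n}\binom{\gamma' n}{\xi\gamma' n}\kappa^{\xi\gamma' n}(1-\kappa)^{(1-\xi)\gamma' n}$$
$$\le \binom{n}{\gamma' n} e^{-K_e(\xi,\kappa)\gamma' n},$$

where we applied the inequality of Corollary A.4 and estimated the term involving
the falling factorials by using the inequality $(a-c)/(b-c) \le a/b$ for $b \ge a$
and $a \ge c \ge 0$.

If $\kappa n$ and $\xi\gamma' n$ are not integral, the inequality holds with $\kappa$ and $\xi$
replaced by $\kappa' = \lfloor\kappa n\rfloor/n$ and $\xi' = \lceil\xi\gamma' n\rceil/(\gamma' n)$. The result follows because the
exponent on the righthand side of the desired inequality is increasing in $\kappa$ and
decreasing in $\xi$ for $\kappa \le \xi$. ■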
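
The function $K_e$ and the two bounds above can be explored numerically. The sketch below is an illustration only: the function names are hypothetical, and the second check covers just the integral case of the counting inequality, written in terms of a population of size `n` with `marked` distinguished elements, `draws` selected elements and `hits` selected distinguished elements.

```python
import math


def xlogy(x, y):
    """x * ln(y) with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def relative_entropy(xi, kappa):
    """K_e(xi, kappa) for 0 < kappa < 1 and 0 <= xi <= 1."""
    return xlogy(xi, xi / kappa) + xlogy(1.0 - xi, (1.0 - xi) / (1.0 - kappa))


def lemma_a5_holds(xi, kappa, tol=1e-12):
    """Check K_e(xi, kappa) >= xi ln(xi/kappa) + kappa - xi."""
    return relative_entropy(xi, kappa) >= xlogy(xi, xi / kappa) + kappa - xi - tol


def counting_bound_holds(n, marked, draws, hits, tol=1e-9):
    """Integral case: C(n - marked, draws - hits) C(marked, hits)
    <= C(n, draws) exp(-K_e(hits/draws, marked/n) draws).

    Requires 0 < marked < n, hits <= marked and draws - hits <= n - marked."""
    lhs = math.log(math.comb(n - marked, draws - hits) * math.comb(marked, hits))
    rhs = (math.log(math.comb(n, draws))
           - relative_entropy(hits / draws, marked / n) * draws)
    return lhs <= rhs + tol


print(lemma_a5_holds(0.6, 0.2))
print(counting_bound_holds(100, 20, 30, 15))
```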



A.2 Cramér's Theorem for the Uniform Distribution

One of the fundamental results of the theory of large deviations is Cramér's
theorem. Here we need a version of this theorem for uniformly distributed
random variables.

Theorem A.7 Let $X_i$ be independent and uniformly distributed on $[-1,1]$ and
write $S_n = \frac{1}{n}\sum_{i=1}^n X_i$. Define $F(x) = \mathrm{Prob}(S_n \le x)$ and let $f(x) = F'(x)$ be
the density of $S_n$. Let $r(x) = -\lim_n \ln(f(x))/n$. Then the following hold:

(1) $r(x) \ge 0$, $r(0) = 0$ and $r(x) = \infty$ for $x \notin (-1,1)$.

(2) $r(x)$ is convex and twice differentiable on $(-1,1)$.

(3) For $x \le 0$, $F(x) \le e^{-r(x)n}$.

(4) For $x < 0$, $r(x) = -\lim_n \ln(F(x))/n$.

(5) There exists $c$ such that for $-1 < x < 1$, $f(x) \le e^{-r(x)n + c\ln((|r'(x)| + e)n)}$.

(6) $r(x) = -\lim_n \ln(f(x))/n$.

Proof. The function $r$ is the rate function. In this case it is obtained as follows.
Let

$$T(t) = \ln \mathrm{Exp}(e^{tX_i}) = \ln\Bigl(\frac{1}{t}\sinh(t)\Bigr).$$

The function $r(x)$ is given by $r(x) = \sup_t (tx - T(t))$. Since $T(t)$ is smooth
and strictly convex, $r(x)$ is obtained by first finding $t(x)$ such that $T'(t(x)) =
x$ and then evaluating $r(x) = t(x)x - T(t(x))$. By implicit differentiation,
$T''(t(x))t'(x) = 1$. By strict convexity, $T''(t)$ is never zero, so $t$ is continuously
differentiable on its domain. By taking higher derivatives implicitly, one
can see that $t$ is in fact smooth (where finite). This implies that $r$ is smooth
where finite. Note that $r'(x) = t(x)$. This together with the proof of Cramér's
theorem found in most textbooks gives (1), (2), (3) and (4) (e.g. [3]). For the
inequality of (5) observe that $f(x)$ is symmetric and unimodal so that for
$x < 0$ and $\delta > 0$, $F(x) \ge \delta f(x - \delta)$. Hence for $\delta < |x|$,

$$f(x) = f(x + \delta - \delta) \le \frac{1}{\delta}F(x + \delta) \le \frac{1}{\delta}e^{-r(x+\delta)n}.$$

If $|x| > 1/(|r'(x)|n)$, let $\delta = 1/(|r'(x)|n)$ and use convexity of $r$ to see that $f(x) \le
|r'(x)|\,n\,e^{-r(x)n + 1} \le e^{-r(x)n + \ln((|r'(x)| + e)n)}$. For $|x| \le 1/(|r'(x)|n)$, we use the result
on cube slicing in [1] which implies that $f(0) \le \sqrt{n/2}$. For such $x$ we have
$r(x) \le 1/n$. Hence $f(x) \le \sqrt{n/2} \le e^{-r(x)n + \ln((|r'(x)| + e)n)}$. For $x = 0$, (5) is trivial,
and for $x > 0$ we can use symmetry. Part (6) follows from (4), (5) and the
observation that for $x < 0$, $F(x) \le (1 - x)f(x)$. ■
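
The recipe in the proof (solve $T'(t(x)) = x$, then evaluate $r(x) = t(x)x - T(t(x))$) can be carried out numerically. The Python sketch below is an illustration, not part of the paper: it assumes $T(t) = \ln(\sinh(t)/t)$, the logarithmic moment generating function of a variable uniform on $[-1,1]$, uses $T'(t) = \coth(t) - 1/t$, and solves for $t(x)$ by bisection. All function names are hypothetical.

```python
import math


def log_mgf(t):
    """T(t) = ln E[exp(t X)] = ln(sinh(t)/t) for X uniform on [-1, 1]."""
    a = abs(t)
    if a < 1e-4:
        return a * a / 6.0  # leading term of the series t^2/6 - t^4/180 + ...
    # ln(sinh(a)) = a + ln(1 - exp(-2a)) - ln(2), which is stable for large a.
    return a + math.log1p(-math.exp(-2.0 * a)) - math.log(2.0) - math.log(a)


def log_mgf_prime(t):
    """T'(t) = coth(t) - 1/t, an odd increasing function with values in (-1, 1)."""
    a = abs(t)
    value = a / 3.0 if a < 1e-4 else 1.0 / math.tanh(a) - 1.0 / a
    return math.copysign(value, t)


def rate(x, iterations=200):
    """Rate function r(x) = sup_t (t x - T(t)); infinite outside (-1, 1)."""
    if abs(x) >= 1.0:
        return math.inf
    if x == 0:
        return 0.0
    target = abs(x)  # r is symmetric, so it suffices to treat x > 0
    lo, hi = 0.0, 1.0
    while log_mgf_prime(hi) < target:
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if log_mgf_prime(mid) < target:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    return t * target - log_mgf(t)


for x in (0.0, 0.1, 0.5, 0.9):
    print(x, rate(x))
```

For small $|x|$ the computed values behave like $3x^2/2$, consistent with the variance $1/3$ of a single uniform variable on $[-1,1]$.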